On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning (NeurIPS 2022)
Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
If you ran experiments:
(a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both the main text and the appendix for experiment details.
(b) Did you report error bars (e.g., with respect to the random seed) after running experiments multiple times? All adaptation experiments in Procgen and RLBench are run with 3 seeds.
(c) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? As stated in Section 2, we use RTX A5000 GPUs, each with 24 GB of memory. The C2F-ARM algorithm and training framework are built on the original authors' implementation.
Did you mention the license of the assets?
Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning
More specifically, we collect 100 training episodes throughout training and evaluate the value network's prediction for the initial state of each trajectory. We then calculate the mean stiffness of the value network across all state pairs and report its average computed over all training epochs. Each agent is trained on 200 training levels for 25M environment steps, and the mean and standard deviation are computed over 10 different runs.
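The pairwise stiffness measurement described above can be sketched as follows; the cosine-of-gradients definition is the standard notion of stiffness, but the function names and the gradient-stacking convention here are illustrative rather than taken from the paper:

```python
import numpy as np

def stiffness(grad_a: np.ndarray, grad_b: np.ndarray) -> float:
    """Cosine similarity between two parameter-gradient vectors.

    Stiffness near 1 means updating the value network on one state
    moves its prediction on the other state in the same direction,
    which is the memorization signal analyzed here.
    """
    denom = np.linalg.norm(grad_a) * np.linalg.norm(grad_b)
    return float(grad_a @ grad_b / denom)

def mean_stiffness(grads: np.ndarray) -> float:
    """Average stiffness over all distinct state pairs.

    `grads` has shape (n_states, n_params): one gradient of the value
    prediction per initial state, stacked as rows (illustrative layout).
    """
    n = len(grads)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean([stiffness(grads[i], grads[j]) for i, j in pairs]))
```

In practice the gradients would come from backpropagating each state's value prediction through the critic; here they are taken as given arrays.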
Instance-based Generalization in Reinforcement Learning
Agents trained via deep reinforcement learning (RL) routinely fail to generalize to unseen environments, even when these share the same underlying dynamics as the training levels. Understanding the generalization properties of RL is one of the challenges of modern machine learning. Towards this goal, we analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs) and formalize the dynamics of training levels as instances. We prove that, independently of the exploration strategy, reusing instances introduces significant changes to the effective Markov dynamics the agent observes during training. Maximizing expected rewards impacts the learned belief state of the agent by inducing undesired instance-specific speed-running policies instead of generalizable ones, even though the latter are sub-optimal on the training set. We provide generalization bounds on the value gap between train and test environments based on the number of training instances, and use the resulting insights to improve performance on unseen levels. We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance-specific exploitation. We experimentally validate our theory, observations, and the proposed computational solution on the CoinRun benchmark.
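One simple way to realize the consensus step is to average the ensemble's per-state action distributions; the paper's exact aggregation rule may differ, so the sketch below is illustrative only:

```python
import numpy as np

def consensus_policy(action_probs: np.ndarray) -> np.ndarray:
    """Combine an ensemble of instance-specialized policies into one
    consensus action distribution by averaging and renormalizing.

    `action_probs` has shape (n_policies, n_actions); each row is one
    specialized policy's distribution for the current belief state.
    Averaging suppresses actions that only a single instance-specific
    policy favors, which is the intent of consensus data collection.
    """
    mean = action_probs.mean(axis=0)
    return mean / mean.sum()
```

Collecting data with this averaged policy, rather than with any single specialized policy, is what prevents instance-specific exploitation during exploration.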
Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning A Stiffness Analysis
The green lines in Figure 1 show that the stiffness decreases as the number of training levels increases in most of the Procgen games. This suggests that the delayed critic update effectively alleviates the memorization problem. Each agent is trained for 8M environment steps, and the mean is computed over 10 different runs.
Wavelet Flow For Extragalactic Foreground Simulations
Extragalactic foregrounds in cosmic microwave background (CMB) observations are both a source of cosmological and astrophysical information and a nuisance to the CMB. Effective field-level modeling that captures their non-Gaussian statistical distributions is increasingly important for optimal information extraction, particularly given the precise and low-noise observations from current and upcoming experiments. We explore the use of Wavelet Flow (WF) models to tackle the novel task of modeling the field-level probability distributions of multi-component CMB secondaries and foregrounds. Specifically, we jointly train a WF model on correlated CMB lensing convergence ($κ$) and cosmic infrared background (CIB) maps and obtain a network that statistically recovers the input to high accuracy -- the trained network generates samples of $κ$ and CIB fields whose average power spectra are within a few percent of the inputs across all scales, and whose Minkowski functionals are similarly accurate compared to the inputs. Leveraging the multiscale architecture of these models, we fine-tune both the model parameters and the priors at each scale independently, optimizing performance across different resolutions. These results demonstrate that WF models can accurately simulate correlated components of CMB secondaries, supporting improved analysis of cosmological data. Our code and trained models can be found at https://github.com/matiwosm/HybridPriorWavletFlow.git.
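The few-percent power-spectrum check can be illustrated with a flat-sky, azimuthally averaged spectrum; real CMB analyses would use curved-sky tooling, so this is only a minimal sketch with made-up function names:

```python
import numpy as np

def radial_power_spectrum(field: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Azimuthally averaged power spectrum of a square 2-D map.

    Bins |FFT|^2 by radial wavenumber; used here to check, scale by
    scale, that generated maps match the input statistics.
    """
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    n = field.shape[0]
    y, x = np.indices(field.shape)
    r = np.hypot(x - n / 2, y - n / 2)
    bins = np.linspace(0.0, r.max(), n_bins + 1)
    which = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    sums = np.bincount(which, weights=power.ravel(), minlength=n_bins)
    counts = np.maximum(np.bincount(which, minlength=n_bins), 1)
    return sums / counts

def fractional_error(generated: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Per-scale fractional deviation between two spectra."""
    return np.abs(generated - target) / np.maximum(target, 1e-30)
```

Averaging `radial_power_spectrum` over many generated samples and comparing against the input maps via `fractional_error` gives the "within a few percent across all scales" style of check described above.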
A Classification
The RL image classification environment consists of a dataset of labelled images. For the variant labelled "Adaptive", we train a classifier. In this section, we derive the optimal memoryless policy: it receives the highest expected test-time return amongst all possible policies. This proposition follows directly from the definition of the epistemic POMDP. In both MDPs, the reward for the "stay" action is always zero.
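A minimal sketch of such a classification environment, assuming a +1 reward for a correct label and 0 otherwise (the reward convention and API below are illustrative; only the zero-reward "stay" action comes from the text):

```python
import numpy as np

class ImageClassificationEnv:
    """Minimal RL classification environment: each episode presents one
    labelled image, and guessing its label is the action. The reward
    shown here (+1 correct, 0 otherwise, 0 for "stay") is an
    illustrative choice, not the paper's exact specification.
    """
    STAY = -1  # sentinel action: zero reward, episode continues

    def __init__(self, images, labels, seed=0):
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.idx = None

    def reset(self):
        self.idx = int(self.rng.integers(len(self.images)))
        return self.images[self.idx]

    def step(self, action):
        if action == self.STAY:
            return self.images[self.idx], 0.0, False
        reward = 1.0 if action == self.labels[self.idx] else 0.0
        return self.images[self.idx], reward, True
```

Under this setup, a memoryless policy's expected test-time return is just its per-image accuracy weighted by the sampling distribution, which is the quantity the optimal-policy derivation maximizes.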
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Ouyang, Xu, Ge, Tao, Hartvigsen, Thomas, Zhang, Zhisong, Mi, Haitao, Yu, Dong
We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with 100 trillion tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model's training level when evaluating low-bit quantization research. To facilitate future research on this problem, we release all the 1500+ quantized checkpoints used in this work at https://huggingface.co/Xu-Ouyang.
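A scaling law of this shape can be fitted by log-linear least squares; the power-law form and every exponent below are illustrative stand-ins, not the paper's fitted values:

```python
import numpy as np

def fit_qid_scaling_law(tokens, params, bits, qid):
    """Fit log(QiD) = k + a*log(D) + b*log(N) + c*log(P) by least squares.

    D = training tokens, N = model size, P = bit width. The power-law
    form mirrors the kind of scaling law described above; the fitted
    exponents come entirely from whatever data is passed in.
    """
    X = np.column_stack([
        np.ones(len(qid)),
        np.log(tokens),
        np.log(params),
        np.log(bits),
    ])
    coeffs, *_ = np.linalg.lstsq(X, np.log(qid), rcond=None)
    return coeffs  # [k, a, b, c]

def predict_qid(coeffs, tokens, params, bits):
    """Evaluate the fitted law at new (tokens, params, bits) points."""
    k, a, b, c = coeffs
    return np.exp(k + a * np.log(tokens) + b * np.log(params) + c * np.log(bits))
```

Extrapolating `predict_qid` to D = 100 trillion tokens is the kind of projection the abstract describes, with the usual caveat that extrapolation far beyond the fitted range is unreliable.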
Leveraging Conversational Generative AI for Anomaly Detection in Digital Substations
Zaboli, Aydin, Choi, Seong Lok, Hong, Junho
This study addresses critical challenges of cybersecurity in digital substations by proposing an innovative task-oriented dialogue (ToD) system for anomaly detection (AD) in multicast messages, specifically Generic Object Oriented Substation Event (GOOSE) and Sampled Value (SV) datasets. Leveraging generative artificial intelligence (GenAI) technology, the proposed framework demonstrates superior error reduction, scalability, and adaptability compared with traditional human-in-the-loop (HITL) processes. Notably, this methodology offers significant advantages over machine learning (ML) techniques in terms of efficiency and implementation speed when confronting novel and/or unknown cyber threats, while also maintaining model complexity and precision. The research employs advanced performance metrics to conduct a comparative assessment between the proposed AD and HITL-based AD frameworks, utilizing a hardware-in-the-loop (HIL) testbed to generate IEC 61850 communication messages and extract their features. This approach presents a promising solution for enhancing the reliability of power system operations in the face of evolving cybersecurity challenges.